
    Impact Factor: outdated artefact or stepping-stone to journal certification?

    A review of Garfield's journal impact factor and its specific implementation as the Thomson Reuters Impact Factor reveals several weaknesses in this commonly used indicator of journal standing. Key limitations include the mismatch between citing and cited documents, the misleading display of three decimals that overstates the real precision, and the absence of confidence intervals. These minor issues are easily amended and should be corrected, but more substantive improvements are needed. There are indications that the scientific community seeks and needs better certification of journal procedures to improve the quality of published science. Comprehensive certification of editorial and review procedures could help ensure adequate procedures to detect duplicate and fraudulent submissions.
    Comment: 25 pages, 12 figures, 6 tables
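One of the easily amended issues the abstract names is the absence of confidence intervals. As a rough illustration only (not the paper's own method), an impact-factor-style ratio of citations to items can be reported with a bootstrap interval instead of a bare three-decimal figure; the citation counts below are invented toy data.

```python
import random

def impact_factor_ci(citations, n_boot=2000, seed=0):
    """Mean citations per item with a bootstrap 95% confidence interval,
    rather than a point value displayed to three deceptive decimals."""
    rng = random.Random(seed)
    point = sum(citations) / len(citations)
    means = sorted(
        sum(rng.choices(citations, k=len(citations))) / len(citations)
        for _ in range(n_boot)
    )
    return point, (means[int(0.025 * n_boot)], means[int(0.975 * n_boot)])

# Toy citation counts for items published in the two preceding years.
cites = [0, 0, 1, 2, 2, 3, 5, 8, 13, 40]
jif, (lo, hi) = impact_factor_ci(cites)
print(f"JIF = {jif:.1f}, 95% CI [{lo:.1f}, {hi:.1f}]")
```

With the skewed, small sample typical of citation data, the interval is wide, which is exactly the uncertainty a three-decimal display hides.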

    Open-Set Classification for Automated Genre Identification

    Automated Genre Identification (AGI) of web pages is a problem of increasing importance, since web genre information (e.g. blog, news, e-shop) can enhance modern Information Retrieval (IR) systems. The state of the art in this field treats AGI as a closed-set classification problem, for which a variety of web page representations and machine learning models have been intensively studied. In this paper, we study AGI as an open-set classification problem, which better reflects the real-world conditions of exploiting AGI in practice. Focusing on the use of content information, different text representation methods (words and character n-grams) are tested. Moreover, two classification methods are examined: one-class SVM learners, used as a baseline, and an ensemble of classifiers based on random feature subspacing, originally proposed for author identification. It is demonstrated that very high precision can be achieved in open-set AGI while recall remains relatively high.
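The open-set twist is that a page may belong to none of the known genres, so the classifier needs a rejection option. A minimal sketch of that idea, assuming character n-gram profiles and a similarity threshold (the genre snippets and threshold value are illustrative, not the paper's actual models):

```python
from collections import Counter
import math

def char_ngrams(text, n=3):
    """Character n-gram frequency profile of a text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    num = sum(a[k] * b[k] for k in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def open_set_classify(page, profiles, threshold=0.25):
    """Assign the closest known genre, or reject as 'unknown' when no
    profile is similar enough -- the open-set rejection option."""
    vec = char_ngrams(page)
    genre, score = max(((g, cosine(vec, p)) for g, p in profiles.items()),
                       key=lambda t: t[1])
    return genre if score >= threshold else "unknown"

# Toy genre profiles built from tell-tale boilerplate phrases.
profiles = {
    "blog": char_ngrams("posted by admin on monday leave a comment read more"),
    "e-shop": char_ngrams("add to cart price in stock free shipping checkout"),
}
print(open_set_classify("add this item to your cart and proceed to checkout", profiles))
```

A one-class SVM or the subspace ensemble from the paper replaces the thresholded cosine here; the structure of the decision (score each known genre, reject below a cutoff) is the same.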

    Reducing the Plagiarism Detection Search Space on the Basis of the Kullback-Leibler Distance

    Automatic plagiarism detection considering a reference corpus compares a suspicious text to a set of original documents in order to relate the plagiarised fragments to their potential source. Publications on this task often assume that the search space (the set of reference documents) is a narrow set where any search strategy will produce a good output in a short time. However, this is not always true. Reference corpora are often composed of a large set of original documents, making a simple exhaustive search strategy practically impossible. Before carrying out an exhaustive search, it is therefore necessary to reduce the search space, represented by the documents in the reference corpus, as much as possible. Our experiments with the METER corpus show that a preliminary search space reduction stage, based on the symmetric Kullback-Leibler distance, reduces the search time dramatically. Additionally, it improves the precision and recall obtained by a search strategy based on the exhaustive comparison of word n-grams. © Springer-Verlag Berlin Heidelberg 2009
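The reduction step can be sketched as follows: rank reference documents by the symmetric Kullback-Leibler distance between term distributions and keep only the closest candidates for the expensive n-gram comparison. This is a minimal illustration under simple assumptions (whitespace tokenisation, additive smoothing), not the paper's exact setup:

```python
from collections import Counter
import math

def term_dist(text, vocab, eps=1e-6):
    """Smoothed term probability distribution over a shared vocabulary."""
    counts = Counter(text.lower().split())
    total = sum(counts.values()) + eps * len(vocab)
    return {w: (counts[w] + eps) / total for w in vocab}

def symmetric_kl(p, q):
    """Symmetric Kullback-Leibler distance: KL(p||q) + KL(q||p)."""
    return sum((p[w] - q[w]) * math.log(p[w] / q[w]) for w in p)

def reduce_search_space(suspicious, references, k=2):
    """Keep the k reference documents closest to the suspicious text,
    so only they undergo the exhaustive word n-gram comparison."""
    vocab = set(suspicious.lower().split())
    for r in references:
        vocab |= set(r.lower().split())
    p = term_dist(suspicious, vocab)
    return sorted(references,
                  key=lambda r: symmetric_kl(p, term_dist(r, vocab)))[:k]
```

Ranking is linear in the corpus size with cheap per-document statistics, which is why it cuts the overall search time even though it adds a stage.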

    Using sentence embedding for cross-language plagiarism detection

    The growth of textual content in various languages and the advancement of automatic translation systems have led to an increase in cases of translated plagiarism. When a text is translated into another language, word order changes and words may be substituted by synonyms, making detection more challenging. The purpose of this paper is to introduce a new technique for English-Arabic cross-language plagiarism detection. This method combines word embedding, term weighting techniques, and universal sentence encoder models in order to improve the detection of sentence similarity. The proposed model has been evaluated on English-Arabic cross-lingual datasets, and experimental results show improved performance when compared with other Arabic-English cross-lingual evaluation methods presented at SemEval-2017.
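The core mechanism is that sentences in both languages are mapped into one shared vector space, where a translated copy lands close to its source regardless of word order or synonym substitution. A toy sketch of that comparison; the two-dimensional word vectors below are invented stand-ins for what a real multilingual encoder would produce:

```python
import math

# Hypothetical bilingual word vectors in a shared space (toy values --
# a real system would obtain these from a multilingual sentence encoder).
EMB = {
    "house": (0.9, 0.1), "بيت": (0.88, 0.12),   # Arabic "house"
    "big":   (0.2, 0.8), "كبير": (0.22, 0.78),  # Arabic "big"
}

def sentence_vec(tokens):
    """Average the word vectors of known tokens into one sentence vector."""
    vecs = [EMB[t] for t in tokens if t in EMB]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(c) / len(vecs) for c in zip(*vecs))

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.hypot(*a) * math.hypot(*b)
    return num / den if den else 0.0

def cross_lingual_similarity(en_tokens, ar_tokens):
    """High similarity suggests one sentence is a translation of the other."""
    return cosine(sentence_vec(en_tokens), sentence_vec(ar_tokens))
```

Averaging makes the sentence vector order-invariant, which is precisely why translated plagiarism survives the word reordering the abstract mentions; term weighting, as in the paper, would replace the plain average with a weighted one.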